
feat(support): add support service with WebSockets and Yamux#47

Open
edospadoni wants to merge 28 commits into main from feature/support-service

Conversation

@edospadoni
Member

@edospadoni edospadoni commented Mar 10, 2026

Support Service — Architecture

How it works

A tunnel client on the customer's system opens a persistent WebSocket to our support service. The connection is multiplexed with yamux — one WebSocket carries many parallel streams. When an operator clicks "Open" in the UI, traffic flows through the tunnel to reach the remote service (web UI, terminal, API) as if it were local.

graph LR
    subgraph Customer System
        TC[tunnel-client<br/>yamux mux] --> WU[Web UI]
        TC --> SA[SSH/API]
        TC --> ETC[...]
    end

    TC ---|WebSocket<br/>single connection| SS

    BR[Browser<br/>operator] --> NG[nginx<br/>proxy]
    NG --> BE[Backend :8080<br/>sessions, auth]
    BE --> SS[Support :8082<br/>tunnels, yamux]

Session Lifecycle

stateDiagram-v2
    [*] --> pending
    pending --> active : WebSocket established
    active --> closed : operator closes
    active --> grace_period : disconnect
    grace_period --> active : reconnect (same session)
    grace_period --> expired : timeout (30-60s)

WebSocket + yamux Multiplexing

The tunnel client opens one WebSocket to the support service. On top of it, yamux creates a multiplexed session — like having many TCP connections inside a single one.

WebSocket connection (single, persistent)
|
+-- yamux session
    |
    +-- stream #0  [manifest]     client → server: service list (JSON)
    +-- stream #1  [diagnostics]  client → server: health report (JSON)
    +-- stream #2  [users]        client → server: ephemeral credentials report (JSON)
    +-- stream #3  [COMMAND]      server → client: add_services / remove_services
    +-- stream #4  [HTTP proxy]   operator browses NethVoice UI
    +-- stream #5  [terminal]     operator opens xterm.js shell
    +-- ...        (up to 64 concurrent streams per tunnel)

How it connects:

  1. Tunnel client sends GET /support/api/tunnel with HTTP Basic Auth
  2. Support service upgrades to WebSocket, wraps it as net.Conn
  3. yamux.Server is created over the wrapped connection (keepalive 15s)
  4. Client opens a manifest stream with a JSON service list — the reachable services on the remote system
  5. Client opens a diagnostics stream with a health snapshot (CPU, RAM, disk, uptime + plugin results)
  6. Client provisions ephemeral support users and opens a users stream to report credentials
  7. Each proxied request from an operator opens a new yamux stream, forwarded to the target service on the customer system

Server-initiated streams: the support service can also open streams toward the tunnel-client. These start with a COMMAND <version>\n header and carry a JSON payload. The tunnel-client processes the command and responds OK\n or ERROR <msg>\n.

On disconnect: the tunnel enters a grace period (30-60s). If the client reconnects, the same session is reused. If the grace period expires, the session is closed and ephemeral credentials are wiped from the database.


Ephemeral User Provisioning

When the tunnel-client connects, it provisions temporary support users on the managed system:

| Platform | What is created | Privileges |
|---|---|---|
| NS8 (NethServer) | Cluster-admin user + domain users (one per LDAP/Samba domain) | Cluster owner role (*) |
| NethSecurity | Local user promoted to admin | Full web UI admin access |

Lifecycle:

  1. On connect: tunnel-client creates users, stores credentials in a local state file (/var/run/my-support-users.json), reports them to the server via yamux USERS_REPORT stream
  2. On disconnect/session end: tunnel-client deletes all ephemeral users and removes the state file
  3. On crash recovery: next startup reads the orphaned state file, runs cleanup (delete users + teardown plugins), then removes the file

After user provisioning, users.d/ plugins configure applications to accept the support credentials (e.g., creating FreePBX admin entries for NethVoice).

Plugin system details: see Plugin Systems: diagnostics.d/ and users.d/ for the complete developer reference on writing diagnostic and user configuration plugins.


Static Service Injection

Operators can add arbitrary host:port services to a running tunnel without reconnection. This is useful for services not auto-discovered via Traefik — for example the web management interface of a device on the customer's LAN (IP phone, managed switch, NAS, etc.).

Operator clicks "Add service" → fills in name, target (host:port), label, TLS
  → POST /api/support-sessions/:id/services
  → Backend validates and publishes to Redis pub/sub: {action: "add_services", services: {...}}
  → Support service opens an outbound yamux COMMAND stream to the tunnel-client
  → Tunnel-client merges the new service, re-sends updated manifest
  → Support service updates its registry for that session
  → Operator can immediately open the new service via the subdomain proxy

Example: to access a Yealink phone's web UI at 192.168.1.100:443 on a customer system, add a service with target: 192.168.1.100:443, tls: true. The phone's interface becomes available at:

https://phone-yealink--<session-uuid>.support.my.nethesis.it/

…as if the operator were on the same LAN as the phone.


How the UI Proxy works (subdomain)

When an operator clicks a service link (e.g. NethVoice UI), the browser opens a new tab on a dedicated subdomain. Each service gets its own origin, so all the app's absolute paths (/_next/, /api/, /static/) work natively.

1. Frontend: POST /api/support-sessions/:id/proxy-token  {service: "nethvoice-ui"}
   Backend:  generates scoped JWT (session_id + service_name + org_role, 8h TTL)
   Response: {url: "https://nethvoice-ui--c37f4ce78b024a1fb123456789abcdef.support.example.com/", token: "ey..."}

2. Browser navigates to: https://nethvoice-ui--c37f4ce78b024a1fb123456789abcdef.support.example.com/?token=ey...
   nginx:   matches *.support.* --> rewrites to /support-proxy/* --> backend
   Backend: validates JWT, sets HttpOnly SameSite=Strict cookie, redirects to same URL without ?token=

3. All subsequent requests carry the cookie automatically:
   Browser --> nginx --> Backend (SubdomainProxy) --> Support service --> yamux stream --> Customer system

The ?token= is removed from the URL after the first request (redirect), so it never leaks in logs, referrer headers, or browser history.


How the Web Terminal works (xterm.js)

The terminal needs a WebSocket from the browser, but browsers can't send Authorization headers on WebSocket connections. Solution: one-time ticket exchanged beforehand.

1. Frontend: POST /api/support-sessions/:id/terminal-ticket  (JWT in Authorization header)
   Backend:  generates random ticket, stores in Redis with 30s TTL
   Response: {ticket: "a1b2c3..."}

2. Frontend opens WebSocket: GET /api/support-sessions/:id/terminal?ticket=a1b2c3...
   Backend:  Redis GETDEL (atomic read + delete, single-use)
             validates ticket matches session
             opens raw TCP to support service, sends WebSocket upgrade with X-Session-Token
             hijacks browser connection (http.Hijacker)
             bridges both sides bidirectionally:

   Browser (xterm.js) <--WebSocket--> Backend (TCP bridge) <--WebSocket--> Support <--yamux stream--> PTY on customer system

The tunnel client spawns a PTY (pseudo-terminal) directly on the customer system — no SSH daemon involved. The PTY output is forwarded as raw bytes through the yamux stream back to the browser's xterm.js.

Why TCP hijacking instead of httputil.ReverseProxy?

httputil.ReverseProxy can't handle WebSocket upgrades. After the 101 Switching Protocols, the connection becomes a raw bidirectional byte stream — not HTTP. The solution is http.Hijacker: take control of the raw TCP socket from Go's HTTP server. Two goroutines then io.Copy bytes in both directions (browser ↔ support service) with no HTTP overhead.


Access Patterns & Auth

| Who does what | Auth mechanism | How it works |
|---|---|---|
| System → tunnel | HTTP Basic Auth | system_key:system_secret (SHA256), 3-tier cache (memory → Redis → DB), rate-limited |
| Operator → session CRUD | JWT + RBAC | connect:systems permission, RBAC scope verified on all operations |
| Operator → web terminal | One-time ticket | JWT exchanged for 30s Redis ticket → GETDEL on use → WebSocket via TCP hijack |
| Operator → UI proxy | Scoped proxy JWT | 8h token with {session_id, service_name, org_role} → SameSite=Strict cookie on subdomain |
| Backend → support service | Per-session token + INTERNAL_SECRET | X-Session-Token (64-char hex, per-session) + INTERNAL_SECRET (required, fail-fast at startup) |

Security Highlights

| Highlight | Detail |
|---|---|
| 🔑 No shared secrets | Each session gets its own token; compromising one doesn't affect others |
| 🎫 Terminal ticket | 30s TTL, single-use (GETDEL), JWT never touches the URL |
| 🍪 Proxy cookie | Token arrives as ?token=, stored as HttpOnly SameSite=Strict cookie, URL cleaned via redirect |
| Constant-time comparisons | crypto/subtle for all token validations |
| 🛡 SSRF protection | Blocks cloud metadata, link-local, multicast + DNS re-validation at proxy time |
| 🖼 Frame protection | CSP frame-ancestors 'self' on proxied responses |
| 🔄 Cache invalidation | Secret regeneration → Redis pub/sub → flush memory + Redis caches instantly |
| 📝 Audit trail | Every operator action logged: who, when, what service, access type |
| 🧹 Credential cleanup | Ephemeral credentials wiped from DB on session close/expire; advisory lock prevents race conditions |
| 🔒 INTERNAL_SECRET | Required at startup (fail-fast), HMAC-signed Redis commands, no fallback |
| 🔐 RBAC enforcement | Close/Extend/Add/Remove services all verify organization scope |

Inter-service Communication

Backend ──INTERNAL_SECRET────▶ Support Service    (required, HMAC-signed Redis commands)
Backend ──X-Session-Token────▶ Support Service    (per-request, per-session scope)
Backend ──Redis pub/sub──────▶ Support Service    (close, add_services, remove_services, cache invalidation)
Support ──yamux COMMAND──────▶ Tunnel Client      (server-initiated: add_services, remove_services)
Support ──yamux stream───────▶ Tunnel Client      (proxied HTTP, terminal)
Support ──WebSocket 4000─────▶ Tunnel Client      (graceful close, no reconnect)

Components & Files

| Component | Path | Purpose |
|---|---|---|
| Support service | services/support/ | WebSocket tunnels, yamux, session DB, service proxy |
| Tunnel client | services/support/cmd/tunnel-client/ | Runs on customer system: discovery, users, diagnostics, streams |
| Backend APIs | backend/methods/support.go, support_proxy.go | Session CRUD, terminal ticket, proxy token, subdomain proxy |
| Frontend | frontend/src/components/support/ | Session dashboard, service list, terminal (xterm.js) |
| Proxy | proxy/nginx.conf | Subdomain routing, tunnel endpoint exposure |
| DB schema | backend/database/migrations/009_*, 018_*–022_* | support_sessions, support_access_logs, diagnostics, users columns |


Testing Environment

To trigger a fresh deployment of all services in the PR preview environment, comment:

update deploy

Automatic PR environments:

Merge Checklist

Code Quality:

  • Backend Tests
  • Collect Tests
  • Sync Tests
  • Frontend Tests

Builds:

  • Backend Build
  • Collect Build
  • Sync Build
  • Frontend Build

@edospadoni edospadoni deployed to feature/support-service - my-frontend-qa PR #47 March 10, 2026 08:00 — with Render Active
@edospadoni edospadoni deployed to feature/support-service - my-backend-qa PR #47 March 10, 2026 08:00 — with Render Active
@github-actions
Contributor

🔗 Redirect URIs Added to Logto

The following redirect URIs have been automatically added to the Logto application configuration:

Redirect URIs:

  • https://my-proxy-qa-pr-47.onrender.com/login-redirect

Post-logout redirect URIs:

  • https://my-proxy-qa-pr-47.onrender.com/login

These will be automatically removed when the PR is closed or merged.

@github-actions
Contributor

github-actions bot commented Mar 10, 2026

🤖 My API structural change detected

Preview documentation

Structural change details

Added (14)

  • DELETE /support-sessions/{id}
  • GET /support-sessions
  • GET /support-sessions/diagnostics
  • GET /support-sessions/{id}
  • GET /support-sessions/{id}/diagnostics
  • GET /support-sessions/{id}/logs
  • GET /support-sessions/{id}/proxy/{service}/{path}
  • GET /support-sessions/{id}/services
  • GET /support-sessions/{id}/terminal
  • GET /support-sessions/{id}/users
  • PATCH /support-sessions/{id}/extend
  • POST /support-sessions/{id}/proxy-token
  • POST /support-sessions/{id}/services
  • POST /support-sessions/{id}/terminal-ticket

Modified (5)

  • GET /systems
    • Response modified: 200
      • Content type modified: application/json
        • Property modified: data
          • Property modified: systems
  • GET /systems/{id}
    • Response modified: 200
      • Content type modified: application/json
        • Property modified: data
          • Property added: support_session_id
  • POST /systems
    • Response modified: 201
      • Content type modified: application/json
        • Property modified: data
          • Property added: support_session_id
  • POST /systems/{id}/regenerate-secret
    • Response modified: 200
      • Content type modified: application/json
        • Property modified: data
          • Property added: support_session_id
  • PUT /systems/{id}
    • Response modified: 200
      • Content type modified: application/json
        • Property modified: data
          • Property added: support_session_id
Powered by Bump.sh

@edospadoni edospadoni force-pushed the feature/support-service branch from c62b877 to 007bd6d Compare March 10, 2026 10:53
@edospadoni
Member Author

edospadoni commented Mar 11, 2026

tunnel-client binary (linux/amd64)

Download:

tunnel-client.zip

Quick start

# Make it executable
chmod +x tunnel-client-linux-amd64

# Run it
./tunnel-client-linux-amd64 \
  --url wss://my-proxy-qa-pr-47.onrender.com/support/api/tunnel \
  --key <SYSTEM_KEY> \
  --secret <SYSTEM_SECRET>

Parameters

| Flag | Env var | Description |
|---|---|---|
| -u, --url | SUPPORT_URL | WebSocket tunnel URL (required) |
| -k, --key | SYSTEM_KEY | System key from registration (required) |
| -s, --secret | SYSTEM_SECRET | System secret from registration (required) |
| -n, --node-id | NODE_ID | Cluster node ID, auto-detected on NS8 |
| -r, --redis-addr | REDIS_ADDR | Redis address, auto-detected on NS8 |
| --static-services | STATIC_SERVICES | Manual service definition: name=host:port[:tls],... |
| --exclude | EXCLUDE_PATTERNS | Comma-separated glob patterns to exclude services |
| --tls-insecure | TLS_INSECURE | Skip TLS certificate verification |
| --discovery-interval | DISCOVERY_INTERVAL | Service re-discovery interval (default 5m) |
| --reconnect-delay | RECONNECT_DELAY | Base reconnect delay (default 5s) |
| --max-reconnect-delay | MAX_RECONNECT_DELAY | Max reconnect delay (default 5m) |
| --diagnostics-dir | DIAGNOSTICS_DIR | Directory with diagnostic plugin scripts (default /usr/share/my/diagnostics.d) |
| --diagnostics-plugin-timeout | DIAGNOSTICS_PLUGIN_TIMEOUT | Timeout per diagnostic plugin (default 10s) |
| --diagnostics-total-timeout | DIAGNOSTICS_TOTAL_TIMEOUT | Max time to wait for all diagnostics before giving up (default 30s) |

Service discovery modes

The tunnel-client auto-detects the environment:

  • NS8: discovers services from Redis + Traefik routes
  • NethSecurity: discovers services from OpenWrt/nginx config
  • Static: define services manually with --static-services

Diagnostics plugin system

At connect time, the tunnel-client collects a health snapshot and sends it to MY over the tunnel. Operators see the results directly in the support session popover — before opening a terminal or proxy — so they have immediate context on the system state.

How it works:

  1. When the tunnel-client starts, it runs all diagnostic plugins in parallel with the WebSocket connection (no delay to the connection itself)
  2. After sending the service manifest, it waits up to --diagnostics-total-timeout (default 30s) for the plugins to finish, then pushes the aggregated report over a dedicated yamux stream
  3. The support service stores the report on the session; MY shows it in the session popover

Built-in plugin (system): always runs regardless of configuration. Collects:

  • OS name and version (from /etc/os-release)
  • CPU load averages (1m / 5m / 15m from /proc/loadavg)
  • RAM usage (from /proc/meminfo, warning >85%, critical >95%)
  • Root disk usage (warning >85%, critical >95%)
  • System uptime

External plugins: any executable file placed in /usr/share/my/diagnostics.d/ is run automatically. Files are executed in alphabetical order, each with its own timeout. This allows NS8 modules, NethSecurity, and third-party integrations to ship their own health checks independently.

Each plugin must:

  • Write a JSON object to stdout
  • Signal severity via exit code: 0 = ok, 1 = warning, 2 = critical
#!/bin/bash
# /usr/share/my/diagnostics.d/10-myservice.sh

STATUS="ok"
SUMMARY="all good"

if ! systemctl is-active --quiet myservice; then
  STATUS="critical"
  SUMMARY="myservice is not running"
fi

echo "{\"id\":\"myservice\",\"name\":\"My Service\",\"status\":\"$STATUS\",\"summary\":\"$SUMMARY\"}"
exit $([ "$STATUS" = "ok" ] && echo 0 || echo 2)

The overall session status shown in MY is the worst status across all plugins (critical > warning > ok). If a plugin exceeds its timeout it is marked timeout and does not block the others.

If --diagnostics-dir points to a directory that does not exist, only the built-in system plugin runs — no error, no configuration needed on systems that have not installed any plugins yet.

Environment variables

All flags can also be passed as env vars:

export SUPPORT_URL=wss://my-proxy-qa-pr-47.onrender.com/support/api/tunnel
export SYSTEM_KEY=<your-key>
export SYSTEM_SECRET=<your-secret>
./tunnel-client-linux-amd64

Show a clickable headset icon next to system name when an active support
session exists. The popover displays session status, dates, and connected
operators with per-node terminal badges. Backend now tracks terminal
disconnect times via access log lifecycle (insert returns ID, disconnect
updates disconnected_at).
…able rate limits

Refactor the tunnel-client from a single 1181-line main.go into organized
internal packages (config, connection, discovery, models, stream, terminal).
Rename traefik.go to nethserver.go with updated function names and log messages.
Replace YAML config with EXCLUDE_PATTERNS env var / --exclude flag for service
filtering. Improve api-cli error logging to include stderr output. Add
configurable rate limiting via env vars (RATE_LIMIT_TUNNEL_PER_IP,
RATE_LIMIT_TUNNEL_PER_KEY, RATE_LIMIT_SESSION_PER_ID, RATE_LIMIT_WINDOW)
with session limit raised from 100 to 500 req/min. Add build-tunnel-client
and run-tunnel-client Makefile targets.
Shift migrations to avoid conflict with 017_inventory_fk_set_null added on main.
@edospadoni edospadoni force-pushed the feature/support-service branch from f56f203 to 2fbfa44 Compare March 19, 2026 09:32
At connect time, the tunnel-client collects a health report and pushes
it to the support service over a dedicated yamux stream. Operators see
the results in the session popover before opening a terminal or proxy.

Built-in system plugin always runs (CPU load, RAM, disk, uptime, OS
info). External plugins can be dropped as executables in
/usr/share/my/diagnostics.d/ - NS8 modules and NethSecurity can ship
their own health checks independently. Each plugin writes JSON to
stdout and signals severity via exit code (0=ok, 1=warning, 2=critical).

The overall session status is the worst status across all plugins.
Diagnostics run in parallel with the WebSocket connection to avoid
adding latency. A per-plugin timeout (default 10s) and a total timeout
(default 30s) prevent slow plugins from blocking the session.

- tunnel-client: new internal/diagnostics package (runner + models),
  built-in system check, DIAGNOSTICS yamux stream after manifest
- support service: acceptControlStream distinguishes DIAGNOSTICS header
  from manifest JSON, SaveDiagnostics() stores JSONB on session
- backend: GET /api/support-sessions/:id/diagnostics with RBAC scoping,
  migration 021 adds diagnostics + diagnostics_at columns
- frontend: diagnostics section in SupportSessionPopover with status
  dot and per-plugin summary rows
Operators can now inject arbitrary host:port services into a running
tunnel session without reconnection, enabling access to LAN devices
(IP phones, switches) through the support proxy.

- Backend: POST /support-sessions/:id/services with RBAC, validation,
  and Redis pub/sub dispatch (add_services action)
- Support service: SendCommandToSession() opens outbound yamux stream,
  writes COMMAND 1\n + JSON payload, waits for OK/ERROR
- Tunnel-client: accept loop pre-reads first line to route COMMAND vs
  CONNECT streams; thread-safe serviceStore with sync.RWMutex
- Frontend: Add Service modal with name/target/label/TLS fields; 1500ms
  delay before re-fetching services to account for async round-trip
- OpenAPI: documented new endpoint with Conflict response component
- README: added COMMAND stream table, Static Service Injection section
Fixes 10 security issues identified in the pen-test review of the
static service injection and diagnostics features:

- SSRF bypass in applyAddServices (HMAC-signed Redis commands, server
  pre-check, and client-side validateTarget)
- Diagnostics JSON schema validation, 512 KB size cap, and DB-enforced
  rate limit across reconnections
- Diagnostic plugins rejected if not owned by root or writable by
  others; sanitized environment strips credentials
- host:port validation uses net.SplitHostPort with numeric range check
- DIAGNOSTICS stream version validated as exact "DIAGNOSTICS 1"
- serviceStore total cap (500) prevents unbounded growth
- Diagnostics goroutine starts only after yamux session is established
Remote apps (NethVoice, NethCTI) proxied through different subdomains
make cross-origin API calls that require CORS headers and shared cookie
authentication across sibling subdomains of the same support session.

Backend:
- Move CORS middleware from router to /api group so it does not
  intercept /support-proxy/* routes
- Add CORS preflight (OPTIONS 204) and response headers for
  same-session sibling subdomains (validated by session slug match)
- Scope proxy cookie to .support.{domain} with SameSite=Lax so it
  is shared across all service subdomains of the same session
- Remove per-service token validation: session ID match is sufficient
  since users have session-level access

Support service:
- Fix non-deterministic hostname rewriting in buildHostRewriteMap:
  when multiple services share the same original hostname, the current
  service's proxy subdomain is always preferred, keeping API calls
  same-origin and letting Traefik handle path-based routing
@edospadoni edospadoni force-pushed the feature/support-service branch from d683765 to 50624ac Compare March 20, 2026 10:43
…er display

Add GET /api/support-sessions/diagnostics?system_id=X endpoint that returns
diagnostics for all active sessions of a system grouped by node, with an
overall_status reflecting the worst across all nodes. Update the frontend
popover to show collapsible per-node sections for multi-node NS8 clusters
while keeping the flat list for single-node systems.
Tunnel-client creates temporary users when a session starts and removes
them when it ends, giving operators access to remote admin interfaces
without requiring customer credentials.

NS8: creates cluster-admin (Redis) + domain users per local LDAP/Samba
provider. Worker nodes fetch credentials from the leader via USERS_FETCH
yamux stream. NethSecurity: creates local admin user via nethsec Python
module.

Plugin system (users.d/): executable scripts configure applications for
the support user. The tunnel-client passes --instances-file with module
context (instances, domains, services) so plugins can configure per-
instance credentials.

Frontend: unified Services & Credentials modal replaces the old service
dropdown, showing cluster admin, domain credentials per module accordion,
and clickable service links.
- Fix re-discovery overwriting injected services: the serviceStore now
  tracks COMMAND-injected services separately and preserves them when
  periodic re-discovery replaces discovered services.

- Add remove_services COMMAND: tunnel-client removes injected services
  from its store and re-sends the manifest. The support service also
  removes them server-side immediately for instant API consistency.

- Add DELETE /api/support-sessions/:id/services/:name endpoint to
  remove custom services via the frontend.

- Rename "Other Services" to "Custom Services" in the frontend with
  a delete button (trash icon) for each custom service.

- Frontend: re-fetch services on modal open for fresh data.
- Make INTERNAL_SECRET mandatory at startup (fail-fast), remove
  fallback that accepted unsigned Redis commands and unauthenticated
  internal requests when secret was empty
- Add RBAC scope verification to CloseSupportSession and
  ExtendSupportSession to prevent cross-tenant session manipulation
- Clear ephemeral credentials (users JSONB) from database on session
  close, expire, and replace to limit credential exposure window
- Add HTTP server timeouts (ReadHeaderTimeout, IdleTimeout) to
  prevent slowloris denial-of-service attacks
- Re-validate service target DNS at proxy connection time to prevent
  TOCTOU DNS rebinding attacks (previously only validated at
  manifest registration)
- Move plugin temp files from /tmp to /var/run/my-support-tmp/ with
  0700 permissions to prevent inotify-based credential snooping
- Add PostgreSQL advisory lock on session creation to prevent race
  conditions when two tunnel-clients connect simultaneously
@edospadoni
Member Author

Plugin Systems: diagnostics.d/ and users.d/

The tunnel-client supports two extensible plugin directories. Both follow the same security model but serve different purposes. This comment is a reference for developers integrating new plugins.


diagnostics.d/ — System Health Checks

Path: /usr/share/my/diagnostics.d/ (configurable via DIAGNOSTICS_DIR or --diagnostics-dir)

Purpose: Each plugin reports health status of a specific subsystem. Results are aggregated into a diagnostics report visible in the support session UI.

Execution model:

  • Plugins are executables (any language: bash, python, Go binary, etc.)
  • Invoked with no arguments, no stdin
  • Must write a JSON PluginResult to stdout (or raw text as fallback)
  • Exit code determines status: 0 = ok, 1 = warning, 2 = critical, other = error
  • Timeout: 10s default (configurable via DIAGNOSTICS_PLUGIN_TIMEOUT)
  • Output limit: 512 KB stdout

Output format (JSON on stdout):

{
  "id": "my-plugin",
  "name": "My Plugin",
  "status": "ok|warning|critical",
  "summary": "optional summary text",
  "checks": [
    { "name": "check_name", "status": "ok", "value": "42", "details": "optional" }
  ]
}

If JSON parsing fails, the raw stdout text becomes the summary field and status is derived from the exit code.

Built-in plugin: system always runs (CPU load, RAM, disk, uptime) — it's hardcoded, not a file.


users.d/ — Application Configuration for Support Users

Path: /usr/share/my/users.d/ (configurable via USERS_DIR or --users-dir)

Purpose: After the tunnel-client provisions ephemeral support users (cluster-admin + domain users on NS8, local admin on NethSecurity), plugins in users.d/ configure applications to accept those credentials. For example, the nethvoice plugin creates a FreePBX ampusers entry so the support user can log in to NethVoice.

Execution model:

  • Plugins are executables invoked with two actions: setup and teardown
  • Arguments: <action> --users-file /path/to/users.json [--instances-file /path/to/instances.json]
  • --users-file contains the full SessionUsers JSON (credentials, platform, session info)
  • --instances-file is provided only if the plugin name matches a discovered NS8 module base name (e.g., plugin nethvoice matches modules nethvoice103, nethvoice104)
  • Timeout: 15s default (configurable via USERS_PLUGIN_TIMEOUT)
  • Output limit: 64 KB stdout

--users-file format (SessionUsers):

{
  "session_id": "uuid",
  "platform": "nethserver|nethsecurity",
  "cluster_admin": { "username": "support-neth-xxxx-yyyy", "password": "..." },
  "domain_users": [
    { "domain": "sf.nethserver.net", "module": "openldap1", "username": "support-neth-xxxx-yyyy", "password": "..." }
  ],
  "local_users": [
    { "username": "support-neth-xxxx-yyyy", "password": "..." }
  ],
  "module_domains": { "nethvoice103": "sf.nethserver.net" },
  "created_at": "2026-03-24T..."
}

--instances-file format (ModuleContext):

{
  "module": "nethvoice",
  "instances": [
    {
      "id": "nethvoice103",
      "node_id": "1",
      "label": "Main PBX",
      "domain": "sf.nethserver.net",
      "services": {
        "nethvoice103-wizard": { "host": "127.0.0.1", "path_prefix": "/nethvoice103-wizard", "tls": true }
      }
    }
  ]
}

setup output (JSON on stdout) — array of AppConfig:

[
  {
    "id": "nethvoice103",
    "name": "NethVoice (Main PBX)",
    "url": "optional direct URL",
    "notes": "Domain: sf.nethserver.net | Service: nethvoice103-wizard"
  }
]

These AppConfig entries appear in the support session UI so operators know which applications are configured and how to access them.

teardown: Called when the session ends (or tunnel-client shuts down). Must undo whatever setup did (delete users, revoke access). Stdout is ignored.


Shared Security Model (both plugin types)

Both diagnostics.d/ and users.d/ apply identical security checks before executing a plugin:

| Check | Rule |
|---|---|
| File type | Must be a regular file (no symlinks, no directories) |
| Executable | Must have at least one execute bit set (0o111) |
| Ownership | Must be owned by root (UID 0) or the tunnel-client process UID |
| Write permissions | Must not be group-writable or world-writable (0o022 mask) |
| Environment | Plugins run with a minimal environment (PATH only), no inherited secrets |
| Timeout | Per-plugin timeout enforced via context.WithTimeout |
| Output limit | Stdout capped (512 KB for diagnostics, 64 KB for users) |
| Temp files | Credential files (--users-file, --instances-file) are written to /var/run/my-support-tmp/ (0700) and deleted after execution |

If any check fails, the plugin is skipped and the failure is recorded in the log.


Plugin Naming Convention (users.d only)

The plugin filename determines module matching:

  • If a plugin is named nethvoice, the tunnel-client checks if any discovered NS8 module has base name nethvoice (stripping trailing digits: nethvoice103 → nethvoice)
  • If a match is found, --instances-file is passed with all matching instances, their domains, labels, node IDs, and service routes
  • If no match is found, the plugin runs without --instances-file (useful for generic plugins that don't need module context)

This enables a single nethvoice plugin to configure all NethVoice instances on the cluster.


Example: Adding a New users.d Plugin

To add support user configuration for a new NS8 module (e.g., webtop):

  1. Create /usr/share/my/users.d/webtop (executable, owned by root, mode 0755)
  2. Handle setup and teardown actions
  3. Read credentials from --users-file (domain user matching the instance's domain)
  4. Read instance context from --instances-file (module instances, services, domains)
  5. Output AppConfig JSON array on stdout during setup

See examples/users.d/nethvoice in this PR for a complete reference implementation.

Add comprehensive developer reference for users.d/ plugins: --users-file
and --instances-file JSON formats, module name matching, AppConfig output
format, and example reference. Add shared plugin security model table
covering ownership, permissions, environment, timeouts, and temp file
handling for both diagnostics.d/ and users.d/ systems. Update credential
lifecycle with database cleanup. Add remove_services command, users/
directory, and examples/ to project structure.